-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add utitlity to visualize datapipe graphs #330
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noob question: Wondering do we have a potential extensibility using graphviz to visualize the attributes for each Node?
Sure. Everything we put in from torchdata.datapipes.iter import IterableWrapper
def no_map(x):
return x
def no_filter(x):
return True
dp1, dp2, dp3 = IterableWrapper([]).fork(3)
dp1 = dp1.map(no_map)
dp2 = dp2.filter(no_filter)
dp = dp1.zip(dp2, dp3) into This is why I suggested #284 (comment). This would reduce |
from torchdata.datapipes.iter import IterDataPipe, IterableWrapper
class CustomIterDataPipe(IterDataPipe):
def noop(self, x):
return x + 1
def __init__(self):
self._dp = IterableWrapper([1, 2, 4]).map(self.noop)
def __iter__(self):
yield from self._dp Due to the circular reference, we either need to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
@NivekT What do you think? We have inline doc for to_graph
. Do we need an example for users shown in README or another place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not getting back to this earlier. Overall, it looks very good to me!
I tried playing around with it to see if there are cases where the output is not sensible. This is the only case where things may be confusing, and I think we may want to show the ChildDataPipe
(perhaps renamed to something else like DemuxChild1
or something). Admittedly, I think the code snippet below may not be representative of the usual use cases of DataPipes in a pipeline:
from torchdata.datapipes.iter import IterableWrapper
from torchdata.datapipes.utils import to_graph
dp = IterableWrapper(range(10))
dp = dp.map(lambda x: x + 1)
dp = dp.filter(lambda _: True)
dp1, dp2 = dp.demux(num_instances=2, classifier_fn=lambda x: x % 0 == 0)
dp3 = dp1.zip(dp2)
g = to_graph(dp3)
g.view()
We can also apply the function on a few torchvision
data pipelines and see what they look like.
I wrote a small script to test this on the very easyeasymediumdifficult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a small script to test this on the
torchvision
datasets. Assuming this PR is merged and you have a nightly fromtorchdata
andtorchvision
installed, this is the result.very easy
CIFAR10
easy
MNIST
medium
VOC
difficult
COCO
CelebA
I wrote a small script to test this on the
torchvision
datasets. Assuming this PR is merged and you have a nightly fromtorchdata
andtorchvision
installed, this is the result.very easy
CIFAR10
easy
MNIST
medium
VOC
difficult
COCO
CelebA
Those graphs look complicated but sensible to me. Unless there are ways to simplify the pre-processing logic, I think they look good. I think the remaining part is the edge case with _ChildDataPipes
combining back together immediately.
@NivekT I've added a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is good to land as it is. I will follow up with a different PR of documentations.
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…507) * add utility to visualize datapipe graphs (#330) Summary: Closes #299. Pull Request resolved: #330 Reviewed By: ejguan Differential Revision: D36990978 Pulled By: NivekT fbshipit-source-id: 423a52e6d59da59826d1422676d28287537dd902 * Update examples and tutorial after automatic sharding has landed (#505) Summary: Pull Request resolved: #505 After the `DataLoader` automatic sharding for DataPipe PR has landed in core, this PR is updating the relevant example and tutorial Test Plan: Imported from OSS Reviewed By: VitalyFedyunin, ejguan Differential Revision: D37040383 Pulled By: NivekT fbshipit-source-id: 446b9e6a0ae804b1a0908a26058680f9b3c2f080 Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Closes #299.